This article presents a network analysis of player transfers recorded on the website "www.transfermarkt.it". The goal of this report is to represent, through network analysis, the transfer market of the last 30 years in the top 6 European leagues, followed by an in-depth look at the Italian leagues, asking questions that would help a sports agent meet the demands of his clients. This topic was chosen because, while network analysis is often used to analyse data from football matches, it has rarely been applied to transfer-market data. The few studies or theses found online only describe the development of past transfers, without focusing on possible uses for football agents.
The study was conducted by considering both the number of transfers and economic flows.
We chose the Transfermarkt site because it gives us a lot of information for free, which is great for network analysis.
Besides creating a script to extract the data, we made use of the information on the repository: https://github.com/emordonez/Transfermarkt.
Since football agents already have match and player data available through paid sites, we wanted to provide an additional tool, based on the football transfer market, to improve both a player's career and the football agent's income.
We proceeded by asking a set of questions, summarized in the points below and then discussed further:
1) What are the top teams, leagues and connections between them over the last 30 years, and how have they varied each decade in terms of the number of transfers in the 6 leagues? Which ones play a central role?
2) Which top European teams move the most money between them? And which teams play a central role?
3) Which roles and ages are bought the most in these leagues, in terms of number of transfers and monetary exchanges?
1) Which are the most active teams in Italy in terms of number of transfers since 2010?
2) What is the shortest path from team A to team B, in terms of the weighted number of transfers? What is the shortest path from team A to team B that provides the highest monetary revenue?
3) Which player nationalities have certain clubs preferred since 2010, in terms of number of transfers?
The project was carried out using the Python programming language, while the IDE used was PyCharm, which also allowed us to work on Jupyter.
The most used Python libraries for the analysis are:
Pandas --> managing and organizing tabular data in DataFrames.
NetworkX --> building the graph structure.
Matplotlib --> static plots.
Plotly --> creating the dynamic graph.
Scikit-learn (sklearn) --> tools for predictive data analysis.
NumPy --> working with large matrices and multidimensional arrays.
Data scraping is used to extract data from websites. In our case we extracted data from www.transfermarkt.it. This is a German website that contains information such as rankings, results, transfers, players' careers and football club data.
Transfers of players from the 1990/1991 football season to the 2020/2021 football season have been downloaded. For Italy, Serie B data from 2010 to 2021 were also downloaded.
#Code scraping
#Although the datasets are provided in the folder containing this script, it is appropriate to show the code used to extract the data.
#import os
# import requests
# from time import sleep
#
# from bs4 import BeautifulSoup
# import pandas as pd
#
#
# def get_clubs_and_transfers(league_name, league_id, season_id, window):
# """Requests the Transfermarkt page for the input league season and scrapes the page HTML for transfer data.
#
# Args:
# league_name (str): Name of the league.
# league_id (str): League's unique Transfermarkt ID.
# season_id (str): First calendar year of the season, e.g. '2018' for 2018-19.
# window (str): 's' for summer or 'w' for winter transfer windows.
# Returns:
# A list of the clubs in the league, and two lists of tables (list of lists) for each club's transfer activity.
# """
# headers = {
# 'User-Agent': 'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/47.0.2526.106 Safari/537.36'}
# base = "https://www.transfermarkt.it"
# url = base + "/{league_name}/transfers/wettbewerb/{league_id}/plus/?saison_id={season_id}&s_w={window}".format(
# league_name=league_name, league_id=league_id, season_id=season_id, window=window)
# try:
# print("Connecting...")
# response = requests.get(url, headers=headers)
# print("Connection successful, status code {}".format(response.status_code))
# except requests.exceptions.RequestException as e:
# print(e)
# exit()
# soup = BeautifulSoup(response.content, 'lxml')
#
# clubs = [tag.text for tag in soup.find_all('div', {'class': 'table-header'})][1:]
#
# tables = [tag.findChild() for tag in soup.find_all('div', {'class': 'responsive-table'})]
# table_in_list = tables[::2]
# table_out_list = tables[1::2]
#
# transfer_in_list = []
# transfer_out_list = []
# column_headers = {'season': season_id, 'window': window, 'league': league_name}
# for table_in, table_out in zip(table_in_list, table_out_list):
# transfer_in_list.append(get_transfer_info(base, table_in, movement='In', **column_headers))
# transfer_out_list.append(get_transfer_info(base, table_out, movement='Out', **column_headers))
#
# return clubs, transfer_in_list, transfer_out_list
#
#
# def get_transfer_info(url_base, table, movement, season, window, league):
# """Helper function to parse an HTML table and extract all desired player information.
#
# Args:
# url_base (str): Transfermarkt URL for profile link prepending.
# table (bs4.element.Tag): BeautifulSoup HTML table.
# movement (str): 'In' for arrival or 'Out' departure.
# season (str): Season.
# window (str): 's' for summer or 'w' for winter.
# league (str): League name.
# Returns:
# The input table's information reformatted as a list of lists.
# """
# transfer_info = []
# trs = table.find_all('tr')
# header_row = [header.get_text(strip=True) for header in trs[0].find_all('th')]
# if header_row:
# header_row[0] = 'Name'
# header_row.insert(0, 'Club')
# header_row[3] = 'Nationality'
# header_row[-3] = 'MarketValue'
# header_row[-2] = 'ClubInvolved'
# header_row.insert(-1, 'CountryInvolved')
# header_row += ['Movement', 'Season', 'Window', 'League', 'Profile']
# transfer_info.append(header_row)
# for tr in trs[1:]:
# row = []
# tds = tr.find_all('td')
# for td in tds:
# child = td.findChild()
# if child and child.get('class'):
# # Player name and profile link
# if child.get('class')[0] == 'di':
# player = child.find('a', href=True)
# row.append([player.get_text(strip=True), url_base + player.get('href')])
# # Player nationality
# elif child.get('class')[0] == 'flaggenrahmen':
# row.append(child.get('alt'))
# # Club dealt to/from
# elif child.get('class')[0] == 'vereinprofil_tooltip':
# row.append(child.findChild().get('alt'))
# else:
# row.append(td.get_text(strip=True))
# # Mark tables of no transfer activity with None for later cleaning
# if "No new arrivals" in row or "No departures" in row:
# transfer_info.append([None] * (len(header_row) - 1))
# else:
# row += [movement, season, window, league]
# row.append(row[0][1])
# row[0] = row[0][0]
# transfer_info.append(row)
#
# return transfer_info
#
#
# def formatted_transfers(clubs, transfers_in, transfers_out):
# """Prepends club names to their transfers.
#
# Args:
# clubs (list): List of clubs.
# transfers_in (list): List of lists.
# transfers_out (list): List of lists.
# Return:
# Updated transfer tables.
# """
# for i in range(len(clubs)):
# club_name = clubs[i]
# for row in transfers_in[i][1:]:
# row.insert(0, club_name)
# for row in transfers_out[i][1:]:
# row.insert(0, club_name)
#
# return transfers_in, transfers_out
#
#
# def transfers_dataframe(tables_list):
# """Converts all transfer tables to dataframes then concatenates them into a single dataframe.
#
# Args:
# tables_list (list): List of transfer DataFrames.
# Returns:
# A DataFrame of all transfers.
# """
# return pd.concat([pd.DataFrame(table[1:], columns=table[0]) for table in tables_list])
#
#
# def export_csv(df, season_id, league_name, league_id):
# """Writes an input DataFrame to a csv in its corresponding season's folder.
#
# Args:
# df (DataFrame): Transfer data to be exported.
# season_id (str): Folder in which to write the csv.
# league_name (str): File name for the csv.
# """
# file_name = '{}_{}_{}.csv'.format(league_name,league_id,season_id)
# current_dir = os.path.dirname(__file__)
# path_name = os.path.join(current_dir, '../{}'.format(season_id))
# if not os.path.exists(path_name):
# os.mkdir(path_name)
#
# export_name = os.path.join(path_name, file_name)
# df.to_csv(export_name, index=False, encoding='utf-8')
#
#
# def scrape_season_transfers(league_name, league_id, season_id, window):
# """Web scrapes Transfermarkt for all transfer activity in a league's given window.
#
# Args:
# league_name (str): Name of the league.
# league_id (str): League's unique Transfermarkt ID.
# season_id (str): First calendar year of the season, e.g. '2018' for 2018-19.
# window (str): 's' for summer or 'w' for winter transfer windows.
# Returns:
# A DataFrame of all season transfer activity in the input league.
# """
# clubs, transfer_in_list, transfer_out_list = get_clubs_and_transfers(league_name, league_id, season_id, window)
# print("Got data for {} {} {} transfer window".format(season_id, league_name.upper(), window.upper()))
# transfers_in, transfers_out = formatted_transfers(clubs, transfer_in_list, transfer_out_list)
# print("Formatted transfers")
# df_in = transfers_dataframe(transfers_in)
# df_out = transfers_dataframe(transfers_out)
# print("Created dataframes")
# print("\n********************************\n")
# return pd.concat([df_in, df_out])
#
#
# def transfers(league_name, league_id, start, stop):
# """Scrape a league's transfers over a range of seasons.
#
# Args:
# league_name (str): Name of the league.
# league_id (str): League's unique Transfermarkt ID.
# start (int): First calendar year of the first season to scrape, e.g. 1992 for the 1992/93 season.
# stop (int): First calendar year of the last season to scrape, e.g. 2019 for the 2019/20 season.
# """
# try:
# for i in range(start, stop + 1):
# league_transfers = []
# season_id = str(i)
# for window in ['e', 'i']:
# league_transfers.append(scrape_season_transfers(league_name, league_id, season_id, window))
# sleep(3)
# df = pd.concat(league_transfers)
# df = df[~df['Name'].isna()]
# df.reset_index(drop=True, inplace=True)
# export_csv(df, season_id, league_name, league_id)
# except TypeError:
# print("Make sure league parameters are STRINGS and years are INTEGERS.")
#
#
# def main():
# # England, Premier League
# # print("Getting Premier League data...\n")
# # transfers('premier-league', 'GB1', 2010, 2020)
# # print("Done with the Premier League!")
# # print("********************************\n")
#
# # Germany, Bundesliga
# # print("Getting Bundesliga data...\n")
# # transfers('1-bundesliga', 'L1', 2010, 2020)
# # print("Done with the Bundesliga!")
# # print("********************************\n")
#
# # Spain, La Liga
# # print("Getting La Liga data...\n")
# # transfers('laliga', 'ES1', 2010, 2020)
# # print("********************************\n")
#
# # Italy, Serie A
# # print("Getting Serie A data...\n")
# # transfers('serie-a', 'IT1', 2010,2020)
# # print("Done with Serie A!")
# # print("********************************\n")
#
# # Italy, Serie B
#
# # print("Getting Serie B data...\n")
# # transfers('serie_b', 'IT2', 2010,2020)
# # print("Done with Serie B!")
# # print("********************************\n")
#
# # Italy, Serie C
# lista = ['IT3C']
# for i in lista:
# print("Getting Serie C data...\n")
# transfers('serie_c', i, 2014,2020)
# print("Done with Serie C!")
# print("********************************\n")
#
# # France, Ligue 1
# # print("Getting Ligue 1 data...\n")
# # transfers('ligue-1', 'FR1', 2010, 2020)
# # print("Done with Ligue 1!")
# # print("********************************\n")
#
# # Portugal
# # print("Getting Liga Nos data...\n")
# # transfers('liga-nos', 'PO1', 2010, 2020)
# # print("Done with Liga Nos!")
# # print("********************************\n")
#
#
# print("\nDone!")
#
#
# if __name__ == "__main__":
# main()
The data cleansing process can be found throughout the code. The most important changes concerned:
The datasets provided to operate this notebook are the union of several datasets extracted with the procedures described above.
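As a rough illustration of that union step, the sketch below concatenates two per-season tables with pandas and drops the empty placeholder rows, in the same spirit as the scraping script does before export. The in-memory CSV strings, the reduced column set and the player names are hypothetical stand-ins for the real files.

```python
import io
import pandas as pd

# Hypothetical stand-ins for two per-season CSV exports; the column names
# are a subset of the headers produced by the scraping script.
season_2010 = io.StringIO(
    "Club,Name,ClubInvolved,Movement,Season,League\n"
    "Inter,Player A,Genoa,Out,2010,serie-a\n"
)
season_2011 = io.StringIO(
    "Club,Name,ClubInvolved,Movement,Season,League\n"
    "Genoa,Player B,Parma FC,Out,2011,serie-a\n"
    ",,,,2011,serie-a\n"  # empty placeholder row (club with no transfers)
)

# Read each file, stack them, then drop rows without a player name.
frames = [pd.read_csv(f) for f in (season_2010, season_2011)]
merged = pd.concat(frames, ignore_index=True)
merged = merged[merged["Name"].notna()].reset_index(drop=True)

print(merged)
```

With real files, the list of frames would typically come from iterating over the exported CSV paths instead of the two in-memory buffers.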
The analysis has been divided into 2 parts
It is important to remember that the football market is the set of contractual negotiations that define the transfer of a player from one club to another. Clubs can only carry out transfer operations in two windows: one during the summer and the other in January. (However, no distinction is made in this draft).
Acting as advisors to a football agent or to the sporting director of a football club, we decided to look back over the last three decades of the football market in the six leagues mentioned above, in order to provide a history and identify the most important teams (nodes) and transfers (arcs).
Three different datasets were examined, one per decade. For each of them, the graph of all transfer-market transactions is displayed.
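The idea of teams as nodes and transfers as weighted arcs can be sketched on a toy table as follows. This is a minimal illustration, not the notebook's exact pipeline: the four rows are invented, and the pair counting uses a plain `groupby` rather than the `get_edges` helper defined in the cells below.

```python
import pandas as pd
import networkx as nx

# Hypothetical toy data: each row is one outgoing transfer.
transfers = pd.DataFrame({
    "Club": ["Inter", "Inter", "Genoa", "Benfica"],
    "ClubInvolved": ["Genoa", "Genoa", "Parma FC", "Inter"],
})

# Count transfers per (selling club, buying club) pair;
# the count becomes the edge weight.
edges = (
    transfers.groupby(["Club", "ClubInvolved"])
    .size()
    .reset_index(name="weight")
)

# Build an undirected graph whose edges carry the transfer count.
g = nx.from_pandas_edgelist(
    edges, source="Club", target="ClubInvolved", edge_attr="weight"
)

print(g.number_of_nodes(), g.number_of_edges())
print(g["Inter"]["Genoa"]["weight"])
```

One caveat worth keeping in mind with an undirected graph: if the edge list contains both (A, B) and (B, A), NetworkX keeps a single edge and the attributes of the row added last, so reciprocal counts are not summed automatically.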
#loading python libraries
import pandas as pd
import networkx as nx
import networkx.algorithms.community as nxcom
import matplotlib
import matplotlib.pyplot as plt
import matplotlib.gridspec as gridspec
import matplotlib.image as mpimg
#import matplotlib.use('TkAgg')
from matplotlib import colors, cm
from matplotlib.collections import LineCollection
import plotly.express as px
import plotly.graph_objects as go
from sklearn.cluster import KMeans
from sklearn.preprocessing import MinMaxScaler
import numpy as np
from numpy import sqrt
import operator
import seaborn as sns
#import community
#from fa2 import ForceAtlas2
#import bezier
import itertools
from collections import Counter
import glob
import PIL
import random
from networkx.algorithms.shortest_paths.weighted import single_source_dijkstra
lista = ['Euro 1990_99','Euro 2000_09','Euro 2010_21']
for i in lista:
url = "Datasets/" + str(i)+".csv"
frame = pd.read_csv(url, error_bad_lines=False, sep=',')
frame = frame[frame.Movement == "Out"]
frame = frame.replace('FC Internazionale','Inter').replace('Parma Calcio 1913','Parma FC')
# lista_unica = list(frame.Club.unique())
# frame = frame[frame['ClubInvolved'].isin(lista_unica)]
frame['new_col'] = frame["Club"] +","+ frame["ClubInvolved"]
# frame = frame[frame.Club != "Calcio Catania"]
# a = pd.DataFrame(frame['new_col'].value_counts()).reset_index()
# a.columns = ['squadre', 'counts']
# b = a[a['counts'] > 20]
def get_edges(data, column):
    series = data[column].dropna().apply(lambda x: x.split(","))
    cross = series.apply(lambda x: list(itertools.combinations(x, 2)))
    lists = [item for sublist in cross for item in sublist]
    source = [i[0] for i in lists]
    target = [i[1] for i in lists]
    edges = pd.DataFrame({"source": source, "target": target})
    edges["weight"] = 1
    #return edges
    return edges.groupby(by=["source", "target"], as_index=False)["weight"].sum()
df_edges_old = get_edges(data=frame, column="new_col")
df = df_edges_old[df_edges_old['weight'] > 0]
#df = df_edges[~df_edges[['source', 'target']].apply(frozenset, axis=1).duplicated()]
g = nx.from_pandas_edgelist(df, source="source", target="target", edge_attr=["weight"],create_using=nx.Graph)
print(nx.info(g))
a = Counter(df.target)
#a = dict(g.degree)
for char in a:
if a[char] > 0:
g.add_node(char, size = a[char])
d = area_dict = dict(zip(frame.Club, frame.League))
for lg in d:
if d[lg] == lg:
g.add_node(lg, league = d[lg])
def make_edge(x, y, text, width):
return go.Scatter(x=x,
y=y,
line=dict(width=width,
color='white'),
hoverinfo='text',
text=([text]),
opacity=1,
mode='lines')
def make_edge2(x, y, text, width):
return go.Scatter(x=x,
y=y,
line=dict(width=width,
color='#ffb81d'),
hoverinfo='text',
text=([text]),
opacity=1,
mode='lines')
def make_edge3(x, y, text, width):
return go.Scatter(x=x,
y=y,
line=dict(width=width,
color='#a61c31'),
hoverinfo='text',
text=([text]),
opacity=1,
mode='lines')
def make_edge25(x, y, text, width):
return go.Scatter(x=x,
y=y,
line=dict(width=width,
color='#e35700'),
hoverinfo='text',
text=([text]),
opacity=1,
mode='lines')
#pos = nx.kamada_kawai_layout(g)
pos = nx.spring_layout(g)
#pos = nx.circular_layout(g)
#pos = nx.spiral_layout(g)
#pos = nx.multipartite_layout(g)
#
# Position nodes in layers of straight lines.
# For each edge, make an edge_trace, append to list
edge_trace = []
for edge in g.edges():
# if g.edges()[edge]['weight'] > 1:
if g.edges()[edge]['weight'] < 10:
char_1 = edge[0]
char_2 = edge[1]
x0, y0 = pos[char_1]
x1, y1 = pos[char_2]
text = char_1 + '--' + char_2 + ': ' + str(g.edges()[edge]['weight'])
trace = make_edge([x0, x1, None], [y0, y1, None],text,
g.edges()[edge]['weight']/70)
edge_trace.append(trace)
elif g.edges()[edge]['weight'] < 15:
char_1 = edge[0]
char_2 = edge[1]
x0, y0 = pos[char_1]
x1, y1 = pos[char_2]
text = char_1 + '--' + char_2 + ': ' + str(g.edges()[edge]['weight'])
trace = make_edge25([x0, x1, None], [y0, y1, None],text,
g.edges()[edge]['weight']/60)
edge_trace.append(trace)
elif g.edges()[edge]['weight'] < 24:
char_1 = edge[0]
char_2 = edge[1]
x0, y0 = pos[char_1]
x1, y1 = pos[char_2]
text = char_1 + '--' + char_2 + ': ' + str(g.edges()[edge]['weight'])
trace = make_edge2([x0, x1, None], [y0, y1, None], text,
g.edges()[edge]['weight'] / 35)
edge_trace.append(trace)
elif g.edges()[edge]['weight'] >= 24:
char_1 = edge[0]
char_2 = edge[1]
x0, y0 = pos[char_1]
x1, y1 = pos[char_2]
text = char_1 + '--' + char_2 + ': ' + str(g.edges()[edge]['weight'])
trace = make_edge3([x0, x1, None], [y0, y1, None], text,
g.edges()[edge]['weight']/20)
edge_trace.append(trace)
node_trace = go.Scatter(x=[],y=[],
mode='markers',
hoverinfo='text',
text= [],
marker=dict(
showscale=True,
# colorscale options
#'Greys' | 'YlGnBu' | 'Greens' | 'YlOrRd' | 'Bluered' | 'RdBu' |
#'Reds' | 'Blues' | 'Picnic' | 'Rainbow' | 'Portland' | 'Jet' |
#'Hot' | 'Blackbody' | 'Earth' | 'Electric' | 'Viridis' |'YlGnBu'
colorscale='Hot',
reversescale=True,
color=[],
size=[],
colorbar=dict(
thickness=15,
title='Degree Centrality',
xanchor='left',
titleside='right'
),
line_width=0.5))
all_weights = []
# 4 a. Iterate through the graph edges to gather all the weights
for (node1, node2, data) in g.edges(data=True):
all_weights.append(data['weight'])
for node in g.nodes():
x, y = pos[node]
node_trace['x'] += tuple([x])
node_trace['y'] += tuple([y])
clubs = list(df.source.unique())
if g.nodes()[node]['size'] <30:
node_trace['marker']['size']+= tuple([g.nodes()[node]['size']*10/ sum(all_weights)*2])
node_trace['text'] += tuple(['<b>' + node + '</b>'])
elif g.nodes()[node]['size'] < 50:
node_trace['marker']['size'] += tuple([g.nodes()[node]['size'] * len(clubs)*13/ sum(all_weights)])
node_trace['text'] += tuple(['<b>' + node + '</b>'])
elif g.nodes()[node]['size'] >= 50:
node_trace['marker']['size'] += tuple([g.nodes()[node]['size'] * len(clubs)*18/ sum(all_weights)])
node_trace['text'] += tuple(['<b>' + node + '</b>'])
node_adjacencies = []
node_text = []
node_betweenness = []
for node, adjacencies in enumerate(g.adjacency()):
node_adjacencies.append(len(adjacencies[1]))
node_text.append('# of connections: '+ str(len(adjacencies[1])))
for value in list(nx.betweenness_centrality(g).values()):
node_betweenness.append(value)
node_text.append('Betweenness: ')
node_trace.marker.color = node_adjacencies
#node_trace.marker.color = node_betweenness
fig = go.Figure(
layout=go.Layout(
title='Network Graph Football '+ str(i),
titlefont_size=20,
paper_bgcolor='rgba(0,0,0,0)', # transparent background
plot_bgcolor='black', # transparent 2nd background
showlegend=False,
hovermode='closest',
margin=dict(b=20,l=5,r=5,t=40),
xaxis=dict(showgrid=False, zeroline=False, showticklabels=False),
yaxis=dict(showgrid=False, zeroline=False, showticklabels=False))
)
# Add all edge traces
for trace in edge_trace:
fig.add_trace(trace)
# Add node trace
fig.add_trace(node_trace)
fig.show()
#fig.write_html('teams'+ str(i)+'.html')
# import dash
# import dash_core_components as dcc
# import dash_html_components as html
#
# app = dash.Dash()
# app.layout = html.Div([
# dcc.Graph(figure=fig)
# ])
Name: Type: Graph Number of nodes: 272 Number of edges: 3220 Average degree: 23.6765
Name: Type: Graph Number of nodes: 281 Number of edges: 4518 Average degree: 32.1566
Name: Type: Graph Number of nodes: 268 Number of edges: 5410 Average degree: 40.3731
In the 1990-99 graph, Benfica, Inter, Olympique Marseille and Torino Calcio are, in that order, the teams with the most transfers between 1990 and 1999. Teams from the German, English and Spanish leagues do not yet form any major nodes. This first graph also shows an important arc between Bayer 04 Leverkusen and Fortuna Düsseldorf. In this period the leagues are clearly separated from one another, almost forming clusters. An important arc between leagues is the one between Juventus and Borussia Dortmund.
In the 2000-2009 graph, Inter, Olympique Marseille, Liverpool and Benfica are, in that order, the teams with the most transfers. Teams still connect mostly with teams in the same league, but the arcs are more pronounced than in the previous decade.
In the last decade analysed, Serie A and La Liga become very active. However, teams such as Porto, Lisbon and Benfica, which belong to the LIGA NOS, become very important in terms of degree centrality. It is interesting to observe how clearly visible the connection is between teams that belong to different leagues but to the same owner, in this case the Pozzo family with Udinese, Granada and Watford.
Edges are white up to 9 transfers, orange from 10 to 14, yellow from 15 to 23 and red from 24 upwards. Node size is based on the number of transfers, node colour on degree.
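The colour rule can be condensed into a single helper. The sketch below follows the thresholds and hex colours used in the plotting code above; the function name `edge_colour` is hypothetical and does not appear in the notebook, which uses four separate `make_edge*` functions instead.

```python
def edge_colour(weight):
    """Return the colour bucket for an edge carrying `weight` transfers.

    Thresholds and hex codes mirror the plotting cell; the function
    itself is an illustrative helper, not part of the original code.
    """
    if weight < 10:
        return "white"
    if weight < 15:
        return "#e35700"  # orange
    if weight < 24:
        return "#ffb81d"  # yellow
    return "#a61c31"      # red


for w in (3, 12, 20, 40):
    print(w, edge_colour(w))
```

Keeping the thresholds in one place like this would make it easier to check that the legend in the text and the colours in the figure stay in agreement.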
df = pd.read_csv("Datasets/leghe_euro_9020.csv",error_bad_lines=False, sep=',')
df["CountryInvolved"].replace({"Italia": "Serie A", "Portogallo": "Liga Nos",
"Inghilterra": "Premier League", "Spagna": "La Liga",
"Francia": "Ligue 1", "Germania": "Bundesliga"}, inplace=True)
df["League"].replace({"serie-a": "Serie A", "liga-nos": "Liga Nos",
"premier-league": "Premier League", "laliga": "La Liga",
"ligue-1": "Ligue 1", "1-bundesliga": "Bundesliga",
"first-division-bis-91-92-": "Premier League"}, inplace=True)
df = df[df.Movement == "Out"]
lista_unica = ["Serie A","Liga Nos","Premier League","La Liga","Ligue 1","Bundesliga"]
df = df[df["CountryInvolved"].isin(lista_unica)]
df['Leghe'] = df["CountryInvolved"] +","+ df["League"]
df = df.replace((list(range(1990, 2000))), '1990-99')\
.replace((list(range(2000, 2010))), '2000-09').replace((list(range(2010, 2021))), '2010-20')
anni = ['1990-99', '2000-09', '2010-20']
# for f in df.columns:
# print(df[f].value_counts())
# print('***********************************')
for anno in anni:
frame = df[df.Season == anno]
def get_edges(data, column):
    series = data[column].dropna().apply(lambda x: x.split(","))
    cross = series.apply(lambda x: list(itertools.combinations(x, 2)))
    lists = [item for sublist in cross for item in sublist]
    source = [i[0] for i in lists]
    target = [i[1] for i in lists]
    edges = pd.DataFrame({"source": source, "target": target})
    edges["weight"] = 1
    #return edges
    return edges.groupby(by=["source", "target"], as_index=False)["weight"].sum()
df_edges_old = get_edges(data=frame, column="Leghe")
df_edges = df_edges_old[df_edges_old['weight'] > 0]
g = nx.from_pandas_edgelist(df_edges, edge_attr=["weight"],create_using=nx.Graph)
layout = nx.spring_layout(g)
frame = df_edges
clubs = list(frame.source.unique())
clubsinv = list(frame.target.unique())
dict(zip(clubs, clubsinv))
fig = plt.figure(figsize=(15, 10),facecolor='black')
#fig.set_facecolor("#00000F")
# 1. Create the graph
# 2. Create a layout for our nodes
#pos = nx.kamada_kawai_layout(g)
#pos = nx.spring_layout(g)
#layout = nx.circular_layout(g)
#pos = nx.multipartite_layout(g)
# # 3. Draw the parts we want
club_size = [g.degree(club) for club in clubs]
nx.draw_networkx_nodes(g,
layout,
nodelist=clubs,
node_size=club_size, # a LIST of sizes, based on g.degree
node_color=club_size)
# Draw EVERYONE
#nx.draw_networkx_nodes(g, layout, nodelist=clubs, node_color='lightblue', node_size=200)
# Draw POPULAR clubs
popular_clubsinv = [clubinv for clubinv in clubsinv if g.degree(clubinv) > 20]
#nx.draw_networkx_nodes(g, layout, nodelist=popular_clubsinv, node_color='#F4D03F', node_size=300)
edges,weights = zip(*nx.get_edge_attributes(g,'weight').items())
#nx.draw(g, pos, node_color='b', edgelist=edges, edge_color=weights, width=1.0, edge_cmap=plt.cm.Blues)
all_weights = []
# 4 a. Iterate through the graph edges to gather all the weights
for (node1, node2, data) in g.edges(data=True):
all_weights.append(data['weight']) # we'll use this when determining edge thickness
# 4 b. Get unique weights
unique_weights = list(set(all_weights))
# 4 c. Plot the edges - one by one!
for weight in unique_weights:
# 4 d. Form a filtered list with just the weight you want to draw
weighted_edges = [(node1, node2) for (node1, node2, edge_attr) in g.edges(data=True) if
edge_attr['weight'] == weight]
width = weight * len(clubs) * 20/ sum(all_weights)
print(weight)
if weight > 200:
nx.draw_networkx_edges(g, layout, edgelist=weighted_edges,
width=width, edge_color= "red")
elif weight > 100:
nx.draw_networkx_edges(g, layout, edgelist=weighted_edges,
width=width, edge_color= "orange")
else:
nx.draw_networkx_edges(g, layout, edgelist=weighted_edges,
width=width, edge_color= "yellow")
# nx.draw_networkx_edges(g, layout, width=weighted_edges, alpha=0.5,
# edge_color=[g[u][v]['Costo'] for u,v in g.edges])
node_labels = dict(zip(clubs,clubsinv))
nx.draw_networkx_labels(g, layout, font_size=16, font_color = 'white')
# 4. Turn off the axis because I know you don't want it
plt.axis('off')
plt.title("Leghe Europee " + (str(anno)))
# 5. Tell matplotlib to show it
plt.show()
1 903 14 2062 16 18 20 24 25 1050 27 31 34 35 1208 60 66 1895 1133
1667 11 1683 20 3863 1178 31 39 43 2993 63 64 66 72 1999 80 89 218 105 121
2568 136 143 2961 146 7193 35 293 187 191 65 71 95 96 2019 102 107 1650 250 1659 255
-In the 1990-99 graph, the LIGA NOS and the BUNDESLIGA have no arc connecting them. This shows that there were no transfers between these leagues in that decade.
-In the 2000-2009 graph, the LIGA NOS and the BUNDESLIGA start to be connected, but only weakly. The connection between Ligue 1 and Liga intensifies, with the Premier League becoming a central node. This can also be seen in its position within the graph.
-In the last graph, covering 2010-2021, the number of transfers has increased considerably. This is shown by the redder colour and the thicker stroke of the links.
Edges are yellow up to 100 transfers, orange from 101 to 200 and red from 201 upwards.
In graph theory, betweenness centrality is a measure of centrality in a graph based on shortest paths. For every pair of vertices in a connected graph, there exists at least one shortest path between them, that is, a path that minimizes either the number of edges it passes through (for unweighted graphs) or the sum of the weights of its edges (for weighted graphs). The betweenness centrality of a vertex is the number of these shortest paths that pass through it. For example, in a telecommunications network, a node with higher betweenness centrality would have more control over the network, because more information passes through that node.
We used it to understand which teams appear most often along the exchanges between other teams and therefore play a central role.
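To make the measure concrete, here is a small sketch on an invented network: two triangles of clubs that can only reach each other through a single intermediary. The club names are hypothetical; `nx.betweenness_centrality` is the same NetworkX function passed to `plot_year` in the code below.

```python
import networkx as nx

# Hypothetical toy network: two triangles of clubs joined through "Hub FC".
g = nx.Graph()
g.add_edges_from([
    ("A", "B"), ("B", "C"), ("A", "C"),   # first triangle
    ("C", "Hub FC"),                      # bridge into the hub
    ("Hub FC", "D"),                      # bridge out of the hub
    ("D", "E"), ("E", "F"), ("D", "F"),   # second triangle
])

# Scores are normalised by default, so they fall between 0 and 1.
bc = nx.betweenness_centrality(g)

# Rank clubs from most to least central.
ranking = sorted(bc.items(), key=lambda kv: kv[1], reverse=True)
for club, score in ranking[:3]:
    print(f"{club}: {score:.3f}")
```

Every shortest path between the two triangles passes through "Hub FC", so it ranks first, followed by the two bridge clubs "C" and "D". Note that when the function is called without a `weight` argument, as here, every edge counts as one step regardless of how many transfers it represents.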
url = "Datasets/leghe_euro_9020.csv"
frame = pd.read_csv(url, error_bad_lines=False, sep=',')
frame = frame[frame.Movement == "Out"]
frame['new_col'] = frame["Club"] +","+ frame["ClubInvolved"]
frame = frame.replace((list(range(1990, 2000))), '1990-99')\
.replace((list(range(2000, 2010))), '2000-09').replace((list(range(2010, 2021))), '2010-20')
anni = ['1990-99', '2000-09', '2010-20']
def annata(i):
    df = frame[frame['Season'] == i]
    return df
def get_edges(data, column):
    series = data[column].dropna().apply(lambda x: x.split(","))
    cross = series.apply(lambda x: list(itertools.combinations(x, 2)))
    lists = [item for sublist in cross for item in sublist]
    source = [i[0] for i in lists]
    target = [i[1] for i in lists]
    edges = pd.DataFrame({"source": source, "target": target})
    edges["weight"] = 1
    #return edges
    return edges.groupby(by=["source", "target"], as_index=False)["weight"].sum()
def plot_year(metric,title):
    def a(num):
        for x in num:
            yield x
    anni = ['1990-99', '2000-09', '2010-20']
    n2=(0,1,2)
    x1=a(anni)
    x2=a(n2)
    fig, ax = plt.subplots(nrows=1, ncols=3, figsize=(15,10))
    fig.suptitle(title, fontsize=16)
    sns.set(style="dark")
    for i,j in zip(x1,x2):
        df_plot = get_edges(data=annata(i), column="new_col")
        df_plot = df_plot[df_plot['weight'] > 0]
        g = nx.from_pandas_edgelist(df_plot, source="source", target="target", edge_attr=["weight"],create_using=nx.Graph)
        # Eigenvector centrality computes the centrality of a node based on the centrality of its neighbours.
        # (The metric actually applied here is whichever function is passed in as `metric`.)
        eig=sorted(metric(g).items(),key=operator.itemgetter(1), reverse=True )
        eig=pd.DataFrame(eig, columns=['teams', 'value']).head(3)
        ax[j].set_title(i ,size = 5, color = 'Black')
        ax[j].tick_params(labelsize=5)
        #fig.patch.set_visible(False)
        sns.barplot(x = 'teams',y="value", data=eig, palette='YlOrRd', ax=ax[j])
        plt.setp(ax[j], ylabel=(''), xlabel=(''))
    plt.tight_layout()
    # labelsize = 1
    # rcParams['xtick.labelsize'] = labelsize
    # rcParams['ytick.labelsize'] = labelsize
    plt.show()
def plot_total(metric):
    fig, ax = plt.subplots(nrows=1, ncols=1, figsize=(15,10))
    sns.set(style="dark")
    df_plot = get_edges(data=frame, column="new_col")
    df_plot = df_plot[df_plot['weight'] > 0]
    g = nx.from_pandas_edgelist(df_plot, source="source", target="target", edge_attr=["weight"], create_using=nx.Graph)
    # Eigenvector centrality computes the centrality of a node based on the centrality of its neighbours.
    # (The metric actually applied here is whichever function is passed in as `metric`.)
    eig = sorted(metric(g).items(), key=operator.itemgetter(1), reverse=True)
    eig = pd.DataFrame(eig, columns=['teams', 'value']).head(4)
    ax.set_title(metric, size=10, color='Black')
    ax.tick_params(labelsize=7)
    # fig.patch.set_visible(False)
    sns.barplot(x='teams', y="value", data=eig, palette='YlOrRd', ax=ax)
    plt.setp(ax, ylabel=(''), xlabel=(''))
    plt.show()
if __name__ == '__main__':
    #short(Shortest)
    #plot_year(nx.closeness_centrality, title= 'Closeness Centrality')
    #plot_year(nx.pagerank, title= 'Page Rank')
    plot_year(nx.betweenness_centrality, title= 'Betweenness Centrality')
    #plot_year(nx.edge_betweenness_centrality, title= 'Edge Betweenness Centrality')
    #plot_year(nx.eigenvector_centrality, title= 'Eigen Vector Centrality')
    #plot_year(nx.degree_centrality, title= 'Degree Centrality')
    #plot_total(nx.algorithms.transitivity)
    #plot_year(nx.algorithms.katz_centrality_numpy, title= 'Katz Centrality')
    #plot_year(nx.algorithms.shortest_path, title= 'Katz Centrality')
-In the period 1990-1999, two teams from the LIGA NOS, Benfica and Lisbon, have the highest betweenness values, followed by Eintracht Frankfurt.
-In the decade 2000-2009, Benfica and Lisbon remain in first and second place, followed by Udinese.
-In the last decade analysed, 2010-2020, there is a clear change: Genoa and Parma emerge among the teams with the highest betweenness.
Acting as an advisor to a football agent or a sports director of a football club, we decided to look at the last decade 2010-2021 of the football market of the 6 leagues mentioned above. This time, however, we will focus on an analysis of monetary transfers.
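The monetary version of the graph can be sketched in the same way as the transfer-count version, summing fees instead of counting rows. In the example below the clubs and amounts are invented; the column name `Costo` and the 70 million euro cut-off are taken from the code that follows.

```python
import pandas as pd
import networkx as nx

# Hypothetical toy fees in euros; "Costo" mirrors the dataset's column name.
fees = pd.DataFrame({
    "Club": ["Club A", "Club A", "Club B", "Club C"],
    "ClubInvolved": ["Club B", "Club B", "Club C", "Club A"],
    "Costo": [50_000_000, 40_000_000, 10_000_000, 80_000_000],
})

# Total money moved for each (selling club, buying club) pair.
totals = fees.groupby(["Club", "ClubInvolved"], as_index=False)["Costo"].sum()

# Keep only the pairs whose total exceeds the threshold.
threshold = 70_000_000
major = totals[totals["Costo"] > threshold]

# Edges now carry the summed fee instead of a transfer count.
g = nx.from_pandas_edgelist(
    major, source="Club", target="ClubInvolved", edge_attr="Costo"
)

print(major)
print(sorted(g.edges(data="Costo")))
```

Filtering on the summed fee rather than on individual transfers means that a pair of clubs with many mid-sized deals can still appear in the graph, which seems to be the intent of the threshold used in the notebook.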
df = pd.read_csv("Datasets/leghe_euro_values.csv",error_bad_lines=False, sep=',')
df = df[df.Movement == "Out"]
# df['Costo'] = df['Costo'].str.replace(' mila €','000').str.replace(' mln €','0000').str.replace(r'\D', '').replace(r'^\s*$', np.nan, regex=True)
#
# df = df.replace('FC Internazionale','Inter').replace('Parma Calcio 1913','Parma FC')
# #
# #27038 values
#
# df.dropna(subset = ["Costo"], inplace=True)
# df['Costo'] = df['Costo'].astype(int)
# df.to_csv(r'Datasets Serie A/serieavalues.csv', index = False)
#9438 values
df['squadre'] = df["Club"] +","+ df["ClubInvolved"]
df["League"].replace({"serie-a": 0, "liga-nos": 1,
"premier-league": 2, "laliga": 3,
"ligue-1": 4, "1-bundesliga": 5,
"first-division-bis-91-92-": 2}, inplace=True)
# df = df[df.Club != "Real Madrid CF"]
# df = df[df.ClubInvolved != "Real Madrid CF"]
new = pd.DataFrame(zip(df.Club,df.ClubInvolved,df.Costo,df.League))
new.rename(columns={0:'Club', 1: 'ClubInvolved', 2: 'Costo', 3:'League'}, inplace=True)
df_edges = new.groupby(by=['Club', 'ClubInvolved', 'League'], as_index=False)['Costo'].sum()
#df = df_edges[~df_edges[['Club', 'ClubInvolved']].apply(frozenset, axis=1).duplicated()]
df_edges = df_edges[df_edges.Costo > 70000000]
# lista_unica = list(df_edges.Club.unique())
# df_edges = df_edges[df_edges['ClubInvolved'].isin(lista_unica)]
g = nx.from_pandas_edgelist(df_edges, source="Club", target="ClubInvolved", edge_attr=["Costo"],create_using=nx.Graph)
d = dict(zip(df_edges.Club, df_edges.League))
for lg in d:
g.add_node(lg, league = d[lg])
df = df_edges
clubs = list(df.Club.unique())
clubsinv = list(df.ClubInvolved.unique())
plt.figure(figsize=(40, 30))
# 1. Create the graph
# 2. Create a layout for our nodes
layout = nx.spring_layout(g)
#
# # 3. Draw the parts we want
club_size = [g.degree(club)*30 for club in clubs]
k = list(d.keys())
v = list(d.values())
nx.draw_networkx_nodes(g,
layout,
nodelist=clubs,
node_size=500, # fixed size; the degree-based club_size list computed above is not used here
node_color=v)
# Draw EVERYONE
#nx.draw_networkx_nodes(g, layout, nodelist=clubs, node_color='lightblue', node_size=200)
# Draw POPULAR clubs
popular_clubsinv = [clubinv for clubinv in clubsinv if g.degree(clubinv) > 20]
#nx.draw_networkx_nodes(g, layout, nodelist=popular_clubsinv, node_color='#F4D03F', node_size=300)
# Set Edge Color based on weight
# values = range(1838) #this is based on the number of edges in the graph, use print len(g.edges()) to determine this
# jet = plt.get_cmap('YlOrRd')
# cNorm = colors.Normalize(vmin=0, vmax=values[-1])
# scalarMap = cm.ScalarMappable(norm=cNorm, cmap=jet)
# colorList = []
#
#
# for i in range(1838):
# colorVal = scalarMap.to_rgba(values[i])
# colorList.append(colorVal)
for u,v,d in g.edges(data=True):
d['weight'] = random.random()
edges,weights = zip(*nx.get_edge_attributes(g,'weight').items())
pos = nx.spring_layout(g)
all_weights = []
# 4 a. Iterate through the graph nodes to gather all the weights
for (node1, node2, data) in g.edges(data=True):
all_weights.append(data['Costo']) # we'll use this when determining edge thickness
# 4 b. Get unique weights
unique_weights = list(set(all_weights))
# 4 c. Plot the edges - one by one!
for weight in unique_weights:
# 4 d. Form a filtered list with just the weight you want to draw
weighted_edges = [(node1, node2) for (node1, node2, edge_attr) in g.edges(data=True) if
edge_attr['Costo'] == weight]
width = weight * len(clubs)*5 / sum(all_weights)
print(weight)
if weight < 150000000:
nx.draw_networkx_edges(g, layout, edgelist=weighted_edges,
width=width, edge_color= "lightblue")
elif weight < 200000000:
nx.draw_networkx_edges(g, layout, edgelist=weighted_edges,
width=width, edge_color="b")
else:
nx.draw_networkx_edges(g, layout, edgelist=weighted_edges,
width=width, edge_color="black")
# nx.draw_networkx_edges(g, layout, width=weighted_edges, alpha=0.5,
# edge_color=[g[u][v]['Costo'] for u,v in g.edges])
node_labels = dict(zip(clubs,popular_clubsinv))
nx.draw_networkx_labels(g, layout, font_size=20)
# 4. Turn off the axis because I know you don't want it
plt.axis('off')
plt.title("Network Transfermarkt Prize Money", fontsize=30)
# 5. Tell matplotlib to show it
plt.show()
In this graph we can see that Juventus is a key node for the exchange of money between Italy and the rest of Europe. One of the central nodes is clearly Barcelona, among the teams that move the most money, notably with PSG and Liverpool. Liverpool is in turn linked with Southampton FC, the club it bought from the most, signing players who helped win the UEFA Champions League in 2019. We also note that the nodes Chelsea, Barcelona, Juventus and Manchester United are the ones with the highest degree.
Edges are light blue up to 150 mln exchanged, blue from 150 mln to 200 mln, and black from 200 mln upward.
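The colour thresholds just described can be isolated in a small helper. The sketch below mirrors the cut-offs used in the plotting code; the function name `edge_color` and the sample amounts are hypothetical and only serve to illustrate the rule.

```python
def edge_color(total_fee):
    """Map the total amount exchanged on an edge (in euros) to a display colour."""
    if total_fee < 150_000_000:
        return "lightblue"
    if total_fee < 200_000_000:
        return "b"  # matplotlib shorthand for blue
    return "black"

# Illustrative amounts, one per colour band.
for fee in (80_000_000, 175_000_000, 225_500_000):
    print(fee, edge_color(fee))
```

With such a helper, a list of colours could be built once from the `Costo` attribute of each edge instead of branching inside the drawing loop.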
df = pd.read_csv("Datasets/leghe_euro_values.csv",error_bad_lines=False, sep=',')
df = df[df.Movement == "Out"]
# df['Costo'] = df['Costo'].str.replace(' mila €','000').str.replace(' mln €','0000').str.replace(r'\D', '').replace(r'^\s*$', np.nan, regex=True)
#
# df = df.replace('FC Internazionale','Inter').replace('Parma Calcio 1913','Parma FC')
# #
# #27038 values
#
# df.dropna(subset = ["Costo"], inplace=True)
# df['Costo'] = df['Costo'].astype(int)
# df.to_csv(r'Datasets Serie A/serieavalues.csv', index = False)
#9438 values
df['squadre'] = df["Club"] +","+ df["ClubInvolved"]
df["League"].replace({"serie-a": 0, "liga-nos": 1,
"premier-league": 2, "laliga": 3,
"ligue-1": 4, "1-bundesliga": 5,
"first-division-bis-91-92-": 2}, inplace=True)
# df = df[df.Club != "FC Barcellona"]
# df = df[df.ClubInvolved != "FC Barcellona"]
new = pd.DataFrame(zip(df.Club,df.ClubInvolved,df.Costo,df.League))
new.rename(columns={0:'Club', 1: 'ClubInvolved', 2: 'Costo', 3:'League'}, inplace=True)
df_edges = new.groupby(by=['Club', 'ClubInvolved', 'League'], as_index=False)['Costo'].sum()
#df = df_edges[~df_edges[['Club', 'ClubInvolved']].apply(frozenset, axis=1).duplicated()]
df_edges = df_edges[df_edges.Costo > 70000000]
#lista_unica = list(df_edges.Club.unique())
#df_edges = df_edges[df_edges['ClubInvolved'].isin(lista_unica)]
g = nx.from_pandas_edgelist(df_edges, source="Club", target="ClubInvolved", edge_attr=["Costo"],create_using=nx.Graph)
def plot_total(metric,title):
fig, ax = plt.subplots(nrows=1, ncols=1, figsize=(15,10))
fig.suptitle(title, fontsize=20)
sns.set(style="dark")
# Eigenvector centrality computes a node's centrality from the centrality of its neighbors.
eig = sorted(metric(g).items(), key=operator.itemgetter(1), reverse=True)
eig = pd.DataFrame(eig, columns=['teams', 'value']).head(4)
ax.set_title(metric, size=0, color='Black')
ax.tick_params(labelsize=15)
# fig.patch.set_visible(False)
sns.barplot(x='teams', y="value", data=eig, palette='YlOrRd', ax=ax)
plt.setp(ax, ylabel=(''), xlabel=(''))
plt.show()
if __name__ == '__main__':
#short(Shortest)
#plot_year(nx.closeness_centrality, title= 'Closeness Centrality')
#plot_total(nx.pagerank, title= 'Page Rank')
plot_total(nx.betweenness_centrality, title= 'Betweenness Centrality')
#plot_year(nx.edge_betweenness_centrality, title= 'Edge Betweenness Centrality')
plot_total(nx.eigenvector_centrality, title= 'Eigen Vector Centrality')
#plot_year(nx.degree_centrality, title= 'Degree Centrality')
#plot_total(nx.algorithms.transitivity)
#plot_year(nx.algorithms.katz_centrality_numpy, title= 'Katz Centrality')
#plot_year(nx.algorithms.shortest_path, title= 'Katz Centrality')
Betweenness centrality is a measure of centrality in a graph based on shortest paths. Over the last 10 years Juventus, based on the monetary amounts exchanged, had the highest betweenness, at 0.30; the graph makes it evident that Juventus is the team that moves the most money between Italy and the rest of Europe. Barcelona, Manchester United and Chelsea follow.
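To make the idea concrete, the sketch below builds a toy graph in which a single club connects two otherwise separate groups. All club names are hypothetical. As in the call used in this report, no `weight` argument is passed to `nx.betweenness_centrality`, so each edge counts as one hop and the amounts exchanged only matter through the filter that decides which edges exist.

```python
import networkx as nx

# Hypothetical clubs: two clusters joined only through "Bridge FC".
g = nx.Graph()
g.add_edges_from([
    ("Ita1", "Ita2"), ("Ita1", "Bridge FC"), ("Ita2", "Bridge FC"),
    ("Eur1", "Eur2"), ("Eur1", "Bridge FC"), ("Eur2", "Bridge FC"),
])

# Fraction of shortest paths between other node pairs that pass through each node.
betweenness = nx.betweenness_centrality(g)
best = max(betweenness, key=betweenness.get)
print(best, round(betweenness[best], 2))
```

Every shortest path between the two clusters has to pass through the bridging club, which is why it gets the highest score while all other nodes score zero.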
Still acting as an advisor to a football agent or the sporting director of a football club, we observed the movements of the transfer market by combining each player's position and age, to see how the flow between the leagues varied.
url = "Datasets/leghe_euro_1020_INOUT.csv"
frame = pd.read_csv(url, error_bad_lines=False, sep=',')
frame["CountryInvolved"].replace({"Italia": "Serie A", "Portogallo": "Liga Nos",
"Inghilterra": "Premier League", "Spagna": "La Liga",
"Francia": "Ligue 1", "Germania": "Bundesliga"}, inplace=True)
frame["League"].replace({"serie-a": "Serie A", "liga-nos": "Liga Nos",
"premier-league": "Premier League", "laliga": "La Liga",
"ligue-1": "Ligue 1", "1-bundesliga": "Bundesliga",
"first-division-bis-91-92-": "Premier League"}, inplace=True)
IN = frame[frame.Movement == "In"]
IN.name = 'to league'
OUT = frame[frame.Movement == "Out"]
OUT.name = 'from league'
def ruoli_eta(ds):
ruoli = ds.replace(['DC','TS','TD','Difesa'],'D').replace(['P','AD','AS','SP','Attacco'],'A').replace(['M','CC','TQ','Centrocampo','CD','CS'],'C')
ruoli = ruoli.replace(['POR'],'P')
ruoli = ruoli[ruoli.Età != '-']
ruoli = ruoli[ruoli.Età != '115']
ruoli = ruoli[ruoli.Età != '-1776']
ruoli['Pos'].value_counts()
ruoli['Età'].value_counts()
ruoli = ruoli.replace(list(map(str,range(13,18))),'13-17').replace(list(map(str,range(18,21))),'18-20')\
.replace(list(map(str,range(21,23))),'21-22').replace(list(map(str,range(23,26))),'23-25')\
.replace(list(map(str,range(26,29))),'26-28').replace(list(map(str,range(29,32))),'29-31')\
.replace(list(map(str,range(32,36))),'32-35').replace(list(map(str,range(36,116))),'36-42')
ruoli['ruoloeta'] = ruoli["Pos"] +"|"+ ruoli["Età"]
ruoli['lega_player'] = ruoli["ruoloeta"] +","+ ruoli["League"]
return ruoli
def get_edges(data, column):
series = data[column].dropna().apply(lambda x: x.split(","))
cross = series.apply(lambda x: list(itertools.combinations(x, 2)))
lists = [item for sublist in cross for item in sublist]
source = [i[0] for i in lists]
target = [i[1] for i in lists]
edges = pd.DataFrame({"source": source, "target": target})
edges["weight"] = 1
#return edges
return edges.groupby(by=["source", "target"], as_index=False)["weight"].sum()
df_edges_old = get_edges(data=ruoli_eta(IN), column="lega_player")
df = df_edges_old[df_edges_old['weight'] > 0]
idx = df.groupby(['source'])['weight'].transform(max) == df['weight']
df = df[idx]
#df = df_edges[~df_edges[['source', 'target']].apply(frozenset, axis=1).duplicated()]
g = nx.from_pandas_edgelist(df, source="source", target="target", edge_attr=["weight"],create_using=nx.Graph)
a = dict(g.degree)
for char in a:
if a[char] > 0:
g.add_node(char, size = a[char])
def make_edge(x, y, text, width):
return go.Scatter(x=x,
y=y,
line=dict(width=width,
color='black'),
hoverinfo='text',
text=([text]),
opacity=1,
mode='lines')
def make_edge2(x, y, text, width):
return go.Scatter(x=x,
y=y,
line=dict(width=width,
color='red'),
hoverinfo='text',
text=([text]),
opacity=1,
mode='lines')
def make_edge3(x, y, text, width):
return go.Scatter(x=x,
y=y,
line=dict(width=width,
color='orange'),
hoverinfo='text',
text=([text]),
opacity=1,
mode='lines')
#pos = nx.kamada_kawai_layout(g)
#pos = nx.spring_layout(g)
pos = nx.circular_layout(g)
#pos =nx.fruchterman_reingold_layout(g)
#pos = nx.spiral_layout(g)
#pos = nx.multipartite_layout(g)
#
# Position nodes in layers of straight lines.
# For each edge, make an edge_trace, append to list
edge_trace = []
for edge in g.edges():
char_1 = edge[0]
char_2 = edge[1]
x0, y0 = pos[char_1]
x1, y1 = pos[char_2]
text = char_1 + '--' + char_2 + ': ' + str(g.edges()[edge]['weight'])
trace = make_edge2([x0, x1, None], [y0, y1, None], text,
g.edges()[edge]['weight']/200)
edge_trace.append(trace)
#
# if g.edges()[edge]['weight'] > 500:
# char_1 = edge[0]
# char_2 = edge[1]
# x0, y0 = pos[char_1]
# x1, y1 = pos[char_2]
# text = char_1 + '--' + char_2 + ': ' + str(g.edges()[edge]['weight'])
#
# trace = make_edge([x0, x1, None], [y0, y1, None],text,
# g.edges()[edge]['weight'] /100)
#
# edge_trace.append(trace)
#
# elif g.edges()[edge]['weight'] > 200:
# char_1 = edge[0]
# char_2 = edge[1]
# x0, y0 = pos[char_1]
# x1, y1 = pos[char_2]
# text = char_1 + '--' + char_2 + ': ' + str(g.edges()[edge]['weight'])
#
# trace = make_edge2([x0, x1, None], [y0, y1, None],text,
# g.edges()[edge]['weight']/100)
#
# edge_trace.append(trace)
#
# elif g.edges()[edge]['weight'] > 0:
# char_1 = edge[0]
# char_2 = edge[1]
# x0, y0 = pos[char_1]
# x1, y1 = pos[char_2]
# text = char_1 + '--' + char_2 + ': ' + str(g.edges()[edge]['weight'])
#
# trace = make_edge3([x0, x1, None], [y0, y1, None], text,
# g.edges()[edge]['weight']/100)
#
# edge_trace.append(trace)
node_trace = go.Scatter(x=[],y=[],
mode='markers+text',
hoverinfo='text',
textfont=dict(size=14),
text= [],
marker=dict(
showscale=False,
# colorscale options
#'Greys' | 'YlGnBu' | 'Greens' | 'YlOrRd' | 'Bluered' | 'RdBu' |
#'Reds' | 'Blues' | 'Picnic' | 'Rainbow' | 'Portland' | 'Jet' |
#'Hot' | 'Blackbody' | 'Earth' | 'Electric' | 'Viridis' |'YlGnBu'
colorscale= 'Reds',
reversescale=False,
color=[],
size=[],
colorbar=dict(
thickness=15,
title='Node Connections',
xanchor='left',
titleside='right'
),
line_width=0.5))
all_weights = []
# 4 a. Iterate through the graph nodes to gather all the weights
for (node1, node2, data) in g.edges(data=True):
all_weights.append(data['weight'])
for node in g.nodes():
x, y = pos[node]
node_trace['x'] += tuple([x])
node_trace['y'] += tuple([y])
clubs = list(df.source.unique())
if g.nodes()[node]['size'] < 10:
node_trace['marker']['size']+= tuple([g.nodes()[node]['size'] * len(clubs) *400/ sum(all_weights)])
node_trace['text'] += tuple(['<b>' + node + '</b>'])
elif g.nodes()[node]['size'] >= 10:
node_trace['marker']['size'] += tuple([g.nodes()[node]['size'] * len(clubs) *400 / sum(all_weights)])
node_trace['text'] += tuple(['<b>' + node + '</b>'])
node_adjacencies = []
node_text = []
node_betweenness = []
for node, adjacencies in enumerate(g.adjacency()):
node_adjacencies.append(len(adjacencies[1]))
node_text.append('# of connections: '+ str(len(adjacencies[1])))
for value in list(nx.betweenness_centrality(g).values()):
node_betweenness.append(value * 250)
for node, name in enumerate(g.nodes()):
node_text.append(name)
node_trace.marker.color = node_adjacencies
#node_trace.text = node_text
fig = go.Figure(
layout=go.Layout(
title='Network Analysis Age and Position '+ IN.name +" by transfer's number",
titlefont_size=20,
paper_bgcolor='rgba(0,0,0,0)', # transparent background
plot_bgcolor='white', # transparent 2nd background
showlegend=False,
hovermode='closest',
margin=dict(b=20,l=5,r=5,t=40),
xaxis=dict(showgrid=False, zeroline=False, showticklabels=False),
yaxis=dict(showgrid=False, zeroline=False, showticklabels=False))
)
# Add all edge traces
for trace in edge_trace:
fig.add_trace(trace)
# Add node trace
fig.add_trace(node_trace)
fig.show()
fig.write_html('ruolo_eta_IN.html')
Looking at the graph, the Premier League stands out as the league with the highest number of transfers, across all years, of very young goalkeepers, defenders and midfielders between 13 and 22. Serie A, on the other hand, is strongly connected with older players. Liga NOS is the league with the highest number of transfers of players in the middle age ranges. The Bundesliga, La Liga and Ligue 1 do have transfers for the various combinations, but they do not appear in the graph because they are never the strongest link for any combination.
Only the most important link for each individual combination was kept, to simplify the display.
Edge thickness reflects the number of transfers.
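Keeping only the most important link for each combination amounts to selecting, for every source node, the edge with the largest weight. A minimal sketch on made-up counts follows; the rows are hypothetical. Note that `idxmax` keeps exactly one row per group, whereas the `transform(max)` comparison used in the code above keeps every row tied for the maximum.

```python
import pandas as pd

# Hypothetical edge list: position|age bucket -> league, with a transfer count.
edges = pd.DataFrame({
    "source": ["D|18-20", "D|18-20", "A|29-31", "A|29-31", "P|13-17"],
    "target": ["Premier League", "Serie A", "Serie A", "La Liga", "Premier League"],
    "weight": [120, 95, 80, 60, 15],
})

# idxmax returns the row label of the heaviest edge within each source group.
strongest = edges.loc[edges.groupby("source")["weight"].idxmax()].reset_index(drop=True)
print(strongest)
```

Leagues that are never the top destination for any bucket drop out of the result, which is why some leagues are missing from the figure.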
url = "Datasets/leghe_euro_1020_INOUT.csv"
frame = pd.read_csv(url, error_bad_lines=False, sep=',')
frame["CountryInvolved"].replace({"Italia": "Serie A", "Portogallo": "Liga Nos",
"Inghilterra": "Premier League", "Spagna": "La Liga",
"Francia": "Ligue 1", "Germania": "Bundesliga"}, inplace=True)
frame["League"].replace({"serie-a": "Serie A", "liga-nos": "Liga Nos",
"premier-league": "Premier League", "laliga": "La Liga",
"ligue-1": "Ligue 1", "1-bundesliga": "Bundesliga",
"first-division-bis-91-92-": "Premier League"}, inplace=True)
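# regex is passed explicitly: the default of Series.str.replace changed from True to False in pandas 2.0
frame['Costo'] = frame['Costo'].str.replace(' mila €','000', regex=False)\
.str.replace(' mln €','0000', regex=False).str.replace(r'\D', '', regex=True).replace(r'^\s*$', np.nan, regex=True)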
frame.dropna(subset = ["Costo"], inplace=True)
frame['Costo'] = frame['Costo'].astype(int)
IN = frame[frame.Movement == "In"]
IN.name = 'to league'
def ruoli_eta(ds):
ruoli = ds.replace(['DC','TS','TD','Difesa'],'D').replace(['P','AD','AS','SP','Attacco'],'A').replace(['M','CC','TQ','Centrocampo','CD','CS'],'C')
ruoli = ruoli.replace(['POR'],'P')
ruoli = ruoli[ruoli.Età != '-']
ruoli = ruoli[ruoli.Età != '115']
ruoli = ruoli[ruoli.Età != '-1776']
ruoli['Pos'].value_counts()
ruoli['Età'].value_counts()
ruoli = ruoli.replace(list(map(str,range(13,18))),'13-17').replace(list(map(str,range(18,21))),'18-20')\
.replace(list(map(str,range(21,23))),'21-22').replace(list(map(str,range(23,26))),'23-25')\
.replace(list(map(str,range(26,29))),'26-28').replace(list(map(str,range(29,32))),'29-31')\
.replace(list(map(str,range(32,36))),'32-35').replace(list(map(str,range(36,116))),'36-42')
ruoli['ruoloeta'] = ruoli["Pos"] +"|"+ ruoli["Età"]
ruoli['lega_player'] = ruoli["ruoloeta"] +","+ ruoli["League"]
ruoli.dropna(subset=["Costo"], inplace=True)
return ruoli
def get_edges(data, column):
series = data[column].dropna().apply(lambda x: x.split(","))
cross = series.apply(lambda x: list(itertools.combinations(x, 2)))
lists = [item for sublist in cross for item in sublist]
source = [i[0] for i in lists]
target = [i[1] for i in lists]
edges = pd.DataFrame({"source": source, "target": target})
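# edges has a fresh 0..n-1 index, so assigning data['Costo'] directly would align on the wrong labels;
# take the fees of the rows that produced the edges and assign them by position instead
edges["weight"] = data.loc[series.index, 'Costo'].to_numpy() / 200000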
#return edges
return edges.groupby(by=["source", "target"], as_index=False)["weight"].sum()
df_edges_old = get_edges(data=ruoli_eta(IN), column="lega_player")
df = df_edges_old[df_edges_old['weight'] > 0]
idx = df.groupby(['source'])['weight'].transform(max) == df['weight']
df = df[idx]
#df = df_edges[~df_edges[['source', 'target']].apply(frozenset, axis=1).duplicated()]
g = nx.from_pandas_edgelist(df, source="source", target="target", edge_attr=["weight"],create_using=nx.Graph)
a = dict(g.degree)
for char in a:
if a[char] > 0:
g.add_node(char, size = a[char])
def make_edge(x, y, text, width):
return go.Scatter(x=x,
y=y,
line=dict(width=width,
color='black'),
hoverinfo='text',
text=([text]),
opacity=1,
mode='lines')
def make_edge2(x, y, text, width):
return go.Scatter(x=x,
y=y,
line=dict(width=width,
color='red'),
hoverinfo='text',
text=([text]),
opacity=1,
mode='lines')
def make_edge3(x, y, text, width):
return go.Scatter(x=x,
y=y,
line=dict(width=width,
color='orange'),
hoverinfo='text',
text=([text]),
opacity=1,
mode='lines')
#pos = nx.kamada_kawai_layout(g)
#pos = nx.spring_layout(g)
pos = nx.circular_layout(g)
#pos =nx.fruchterman_reingold_layout(g)
#pos = nx.spiral_layout(g)
#pos = nx.multipartite_layout(g)
#
# Position nodes in layers of straight lines.
# For each edge, make an edge_trace, append to list
edge_trace = []
for edge in g.edges():
char_1 = edge[0]
char_2 = edge[1]
x0, y0 = pos[char_1]
x1, y1 = pos[char_2]
text = char_1 + '--' + char_2 + ': ' + str(g.edges()[edge]['weight'])
trace = make_edge2([x0, x1, None], [y0, y1, None], text,
g.edges()[edge]['weight']/300)
edge_trace.append(trace)
node_trace = go.Scatter(x=[],y=[],
mode='markers+text',
hoverinfo='text',
textfont=dict(size=14),
fillcolor='yellow',
text= [],
marker=dict(
showscale=False,
# colorscale options
#'Greys' | 'YlGnBu' | 'Greens' | 'YlOrRd' | 'Bluered' | 'RdBu' |
#'Reds' | 'Blues' | 'Picnic' | 'Rainbow' | 'Portland' | 'Jet' |
#'Hot' | 'Blackbody' | 'Earth' | 'Electric' | 'Viridis' |'YlGnBu'
colorscale= 'Reds',
reversescale=False,
color=[],
size=[],
line_width=0.5))
all_weights = []
# 4 a. Iterate through the graph nodes to gather all the weights
for (node1, node2, data) in g.edges(data=True):
all_weights.append(data['weight'])
for node in g.nodes():
x, y = pos[node]
node_trace['x'] += tuple([x])
node_trace['y'] += tuple([y])
clubs = list(df.source.unique())
if g.nodes()[node]['size'] < 10:
node_trace['marker']['size']+= tuple([g.nodes()[node]['size'] * len(clubs) *400/ sum(all_weights)])
node_trace['text'] += tuple(['<b>' + node + '</b>'])
elif g.nodes()[node]['size'] >= 10:
node_trace['marker']['size'] += tuple([g.nodes()[node]['size'] * len(clubs) *400 / sum(all_weights)])
node_trace['text'] += tuple(['<b>' + node + '</b>'])
node_adjacencies = []
node_text = []
node_betweenness = []
for node, adjacencies in enumerate(g.adjacency()):
node_adjacencies.append(len(adjacencies[1]))
node_text.append('# of connections: '+ str(len(adjacencies[1])))
for value in list(nx.betweenness_centrality(g).values()):
node_betweenness.append(value * 250)
for node, name in enumerate(g.nodes()):
node_text.append(name)
node_trace.marker.color = node_adjacencies
#node_trace.text = node_text
fig = go.Figure(
layout=go.Layout(
title='Network Analysis Age and Position '+ IN.name +" by Values",
titlefont_size=20,
paper_bgcolor='rgba(0,0,0,0)', # transparent background
plot_bgcolor='white', # transparent 2nd background
showlegend=False,
hovermode='closest',
margin=dict(b=20,l=5,r=5,t=40),
xaxis=dict(showgrid=False, zeroline=False, showticklabels=False),
yaxis=dict(showgrid=False, zeroline=False, showticklabels=False))
)
# Add all edge traces
for trace in edge_trace:
fig.add_trace(trace)
# Add node trace
fig.add_trace(node_trace)
fig.show()
#fig.write_html('ruolo_eta_values_IN.html')
The graph above shows that the Premier League is the league with the highest transfer spending across the different combinations of age and position, in particular for very young defenders and midfielders. Serie A, on the other hand, stands out for its purchases of older players. It is worth noting that the Bundesliga's spending is homogeneous, so much so that it does not appear as the most important link for any combination. Liga NOS spends a lot on goalkeepers aged 18-20, and Ligue 1 on strikers aged 36 to 42.
Only the most important link for each individual combination was kept, to simplify the display.
Edge thickness reflects the money exchanged.
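The position-and-age labels used as nodes in both graphs are built in `ruoli_eta` with a chain of `replace` calls, one per age range. The same ranges can be expressed with `pd.cut`. The sketch below is an alternative formulation on hypothetical rows, not the code that produced the figures; the `bucket` column is an assumed name.

```python
import pandas as pd

# Hypothetical rows; ages arrive as strings in the scraped data.
players = pd.DataFrame({
    "Pos": ["D", "A", "C", "P"],
    "Età": ["17", "21", "28", "36"],
})

# Right-inclusive bin edges matching the ranges used above (13-17, 18-20, ...).
bins = [12, 17, 20, 22, 25, 28, 31, 35, 115]
labels = ["13-17", "18-20", "21-22", "23-25", "26-28", "29-31", "32-35", "36-42"]

ages = pd.to_numeric(players["Età"], errors="coerce")  # non-numeric entries such as '-' become NaN
players["bucket"] = pd.cut(ages, bins=bins, labels=labels).astype(str)
players["ruoloeta"] = players["Pos"] + "|" + players["bucket"]
print(players["ruoloeta"].tolist())
```

Converting the ages to numbers first also makes it easier to discard implausible values with an ordinary comparison rather than by listing them one at a time.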
Still acting as an advisor to a football agent or the sporting director of a football club, we then looked at the last decade (2010-2021) of the transfer market in the Italian professional leagues, i.e. Serie A, Serie B and Serie C.
url = "Datasets/leghe_it.csv"
frame = pd.read_csv(url, error_bad_lines=False, sep=',')
frame = frame[frame.Movement == "Out"]
frame = frame.replace('FC Internazionale','Inter').replace('Parma Calcio 1913','Parma FC')
lista_unica = list(frame.Club.unique())
frame = frame[frame['ClubInvolved'].isin(lista_unica)]
frame['new_col'] = frame["Club"] +","+ frame["ClubInvolved"]
# frame = frame[frame.Club != "Calcio Catania"]
# a = pd.DataFrame(frame['new_col'].value_counts()).reset_index()
# a.columns = ['squadre', 'counts']
# b = a[a['counts'] > 20]
def get_edges(data, column):
series = data[column].dropna().apply(lambda x: x.split(","))
cross = series.apply(lambda x: list(itertools.combinations(x, 2)))
lists = [item for sublist in cross for item in sublist]
source = [i[0] for i in lists]
target = [i[1] for i in lists]
edges = pd.DataFrame({"source": source, "target": target})
edges["weight"] = 1
#return edges
return edges.groupby(by=["source", "target"], as_index=False)["weight"].sum()
df_edges_old = get_edges(data=frame, column="new_col")
df = df_edges_old[df_edges_old['weight'] > 0]
#df = df_edges[~df_edges[['source', 'target']].apply(frozenset, axis=1).duplicated()]
g = nx.from_pandas_edgelist(df, source="source", target="target", edge_attr=["weight"],create_using=nx.Graph)
#a = dict(g.degree)
a = Counter(df.source)
for char in a:
if a[char] > 0:
g.add_node(char, size = a[char])
def make_edge(x, y, text, width):
return go.Scatter(x=x,
y=y,
line=dict(width=width,
color='white'),
hoverinfo='text',
text=([text]),
opacity=1,
mode='lines')
def make_edge2(x, y, text, width):
return go.Scatter(x=x,
y=y,
line=dict(width=width,
color='yellow'),
hoverinfo='text',
text=([text]),
opacity=1,
mode='lines')
def make_edge3(x, y, text, width):
return go.Scatter(x=x,
y=y,
line=dict(width=width,
color='red'),
hoverinfo='text',
text=([text]),
opacity=1,
mode='lines')
def make_edge4(x, y, text, width):
return go.Scatter(x=x,
y=y,
line=dict(width=width,
color='orange'),
hoverinfo='text',
text=([text]),
opacity=1,
mode='lines')
#pos = nx.kamada_kawai_layout(g)
#pos = nx.spring_layout(g)
#pos = nx.circular_layout(g)
pos = nx.spiral_layout(g)
#pos = nx.multipartite_layout(g)
#
# Position nodes in layers of straight lines.
# For each edge, make an edge_trace, append to list
edge_trace = []
for edge in g.edges():
# if g.edges()[edge]['weight'] > 1:
if g.edges()[edge]['weight'] < 5:
char_1 = edge[0]
char_2 = edge[1]
x0, y0 = pos[char_1]
x1, y1 = pos[char_2]
text = char_1 + '--' + char_2 + ': ' + str(g.edges()[edge]['weight'])
trace = make_edge([x0, x1, None], [y0, y1, None],text,
g.edges()[edge]['weight']/50)
edge_trace.append(trace)
elif g.edges()[edge]['weight'] < 18:
char_1 = edge[0]
char_2 = edge[1]
x0, y0 = pos[char_1]
x1, y1 = pos[char_2]
text = char_1 + '--' + char_2 + ': ' + str(g.edges()[edge]['weight'])
trace = make_edge4([x0, x1, None], [y0, y1, None],text,
g.edges()[edge]['weight']/40)
edge_trace.append(trace)
elif g.edges()[edge]['weight'] < 25:
char_1 = edge[0]
char_2 = edge[1]
x0, y0 = pos[char_1]
x1, y1 = pos[char_2]
text = char_1 + '--' + char_2 + ': ' + str(g.edges()[edge]['weight'])
trace = make_edge2([x0, x1, None], [y0, y1, None], text,
g.edges()[edge]['weight'] / 35)
edge_trace.append(trace)
elif g.edges()[edge]['weight'] >= 25:
char_1 = edge[0]
char_2 = edge[1]
x0, y0 = pos[char_1]
x1, y1 = pos[char_2]
text = char_1 + '--' + char_2 + ': ' + str(g.edges()[edge]['weight'])
trace = make_edge3([x0, x1, None], [y0, y1, None], text,
g.edges()[edge]['weight']/20)
edge_trace.append(trace)
node_trace = go.Scatter(x=[],y=[],
mode='markers',
hoverinfo='text',
text= [],
marker=dict(
showscale=True,
# colorscale options
#'Greys' | 'YlGnBu' | 'Greens' | 'YlOrRd' | 'Bluered' | 'RdBu' |
#'Reds' | 'Blues' | 'Picnic' | 'Rainbow' | 'Portland' | 'Jet' |
#'Hot' | 'Blackbody' | 'Earth' | 'Electric' | 'Viridis' |'YlGnBu'
colorscale='Hot',
reversescale=True,
color=[],
size=[],
colorbar=dict(
thickness=15,
title='Node Connections',
xanchor='left',
titleside='right'
),
line_width=0.5))
all_weights = []
# 4 a. Iterate through the graph nodes to gather all the weights
for (node1, node2, data) in g.edges(data=True):
all_weights.append(data['weight'])
for node in g.nodes():
x, y = pos[node]
node_trace['x'] += tuple([x])
node_trace['y'] += tuple([y])
clubs = list(df.source.unique())
if g.nodes()[node]['size'] <50:
node_trace['marker']['size']+= tuple([g.nodes()[node]['size'] * len(clubs) / sum(all_weights)])
node_trace['text'] += tuple(['<b>' + node + '</b>'])
elif g.nodes()[node]['size'] < 100:
node_trace['marker']['size'] += tuple([g.nodes()[node]['size'] * len(clubs) *5/ sum(all_weights)])
node_trace['text'] += tuple(['<b>' + node + '</b>'])
elif g.nodes()[node]['size'] >= 100:
node_trace['marker']['size'] += tuple([g.nodes()[node]['size'] * len(clubs) * 10 / sum(all_weights)])
node_trace['text'] += tuple(['<b>' + node + '</b>'])
node_adjacencies = []
node_text = []
node_betweenness = []
for node, adjacencies in enumerate(g.adjacency()):
node_adjacencies.append(len(adjacencies[1]))
node_text.append('# of connections: '+ str(len(adjacencies[1])))
for value in list(nx.betweenness_centrality(g).values()):
node_betweenness.append(value)
node_text.append('Betweenness: ')
node_trace.marker.color = node_adjacencies
#node_trace.marker.color = node_betweenness
fig = go.Figure(
layout=go.Layout(
title='Network Graph Football Italy 2010-2021',
titlefont_size=20,
paper_bgcolor='rgba(0,0,0,0)', # transparent background
plot_bgcolor='black', # transparent 2nd background
showlegend=False,
hovermode='closest',
margin=dict(b=20,l=5,r=5,t=40),
xaxis=dict(showgrid=False, zeroline=False, showticklabels=False),
yaxis=dict(showgrid=False, zeroline=False, showticklabels=False))
)
# Add all edge traces
for trace in edge_trace:
fig.add_trace(trace)
# Add node trace
fig.add_trace(node_trace)
fig.show()
In the graph representing the transfers of the last 11 years, the nodes with the most transfers are Parma, Atalanta and Chievo Verona.
The connection between Lazio and Salernitana is also very marked, because the two clubs belong to the same ownership, headed by President Lotito.
There is also a very strong connection between Cagliari and Olbia: over the last ten years cooperation between the two Sardinian clubs has intensified, helped by the friendship between the two club presidents.
Other important links involve Parma, Crotone and Gubbio.
Following the thresholds in the code, edges are white up to 4 transfers, orange from 5 to 17, yellow from 18 to 24 and red from 25 upward. Node size is based on the number of transfers, node color on degree.
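Strong pairings such as the ones described above can also be checked directly by counting transfers per pair of clubs regardless of direction. The sketch below uses hypothetical transfer rows. One point worth keeping in mind is that `nx.Graph` stores a single edge per pair of nodes, so if an edge list contains both A to B and B to A, the attributes of the row added last replace those of the earlier one. Summing over sorted pairs first avoids that.

```python
from collections import Counter

# Hypothetical outgoing transfers as (selling club, buying club) pairs.
transfers = [
    ("Lazio", "Salernitana"), ("Salernitana", "Lazio"), ("Lazio", "Salernitana"),
    ("Cagliari", "Olbia"), ("Olbia", "Cagliari"),
    ("Parma", "Crotone"),
]

# Sorting each pair makes A->B and B->A count toward the same undirected link.
pair_counts = Counter(tuple(sorted(pair)) for pair in transfers)
for (club_a, club_b), n in pair_counts.most_common(2):
    print(f"{club_a} - {club_b}: {n}")
```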
Still with the aim of helping a football agent, we built a metric to indicate the 3 best routes for bringing a player to a top club. It was designed to work with both the number of transfers and the monetary amounts.
url = "Datasets/leghe_it.csv"
frame = pd.read_csv(url, error_bad_lines=False, sep=',')
frame = frame[frame.Movement == "Out"]
lista_unica = list(frame.Club.unique())
frame = frame[frame['ClubInvolved'].isin(lista_unica)]
frame['new_col'] = frame["Club"] +","+ frame["ClubInvolved"]
# frame = frame[frame.Club != "Calcio Catania"]
# a = pd.DataFrame(frame['new_col'].value_counts()).reset_index()
# a.columns = ['squadre', 'counts']
# b = a[a['counts'] > 20]
def get_edges(data, column):
series = data[column].dropna().apply(lambda x: x.split(","))
cross = series.apply(lambda x: list(itertools.combinations(x, 2)))
lists = [item for sublist in cross for item in sublist]
source = [i[0] for i in lists]
target = [i[1] for i in lists]
edges = pd.DataFrame({"source": source, "target": target})
edges["weight"] = 1
#return edges
return edges.groupby(by=["source", "target"], as_index=False)["weight"].sum()
df_edges_old = get_edges(data=frame, column="new_col")
df = df_edges_old[df_edges_old['weight'] > 0]
#df = df_edges[~df_edges[['source', 'target']].apply(frozenset, axis=1).duplicated()]
g = nx.from_pandas_edgelist(df, source="source", target="target", edge_attr=["weight"],create_using=nx.Graph)
#++++++++ SELECT TEAM ++++++++++
#start = 'SEF Torres 1903'
#start = 'Tharros'
start = 'Olbia Calcio 1905'
end = 'Juventus FC'
Shortest = (list(nx.all_shortest_paths(g, start, end)))
def short(prova):
    # Score each shortest path by its average edge weight (transfers per hop)
    pesi = []
    for lista in prova:
        a = 0
        for j in range(0, len(lista)-1):
            a += g.edges[lista[j],lista[j+1]]['weight']
        pesi.append(a/(len(lista)-1))
    # Message (Italian): "The best paths in terms of probability for the agent,
    # starting from team {} to reach team {}, are:"
    print('I percorsi migliori in termini di probabilità per il procuratore '
          'per partire dalla squadra {} \nper arrivare '
          'alla squadra {} sono: \n'.format(start, end))
    perc = []
    values = sorted(pesi, reverse=True)[:3]
    # Print and collect the 3 highest-scoring paths (assumes at least 3 paths exist)
    for i in range(0,3):
        a = sorted(pesi, reverse=True)[i]
        max_index = pesi.index(a)
        print(prova[max_index])
        perc.append(tuple(prova[max_index]))
    # Bar chart with the scores of the top 3 paths
    diz = {'Percorsi' : perc, 'Valori': values}
    data = pd.DataFrame(diz)
    fig, ax = plt.subplots(nrows=1, ncols=1, figsize=(15,10))
    sns.set(style="dark")
    ax.set_title('Probability',size=13, color='Black')
    ax.tick_params(labelsize=11)
    sns.barplot(x='Percorsi', y="Valori", data=data, palette='YlOrRd', ax=ax)
    plt.setp(ax, ylabel=(''), xlabel=(''))
    plt.show()
    # Returns the last path printed, i.e. the third best
    return prova[max_index]
short(Shortest)
I percorsi migliori in termini di probabilità per il procuratore per partire dalla squadra Olbia Calcio 1905
per arrivare alla squadra Juventus FC sono:

['Olbia Calcio 1905', 'Cagliari Calcio', 'Juventus FC']
['Olbia Calcio 1905', 'Genoa CFC', 'Juventus FC']
['Olbia Calcio 1905', 'Atalanta', 'Juventus FC']
['Olbia Calcio 1905', 'Atalanta', 'Juventus FC']
We have a young player from Olbia who wants to join Juventus. Which path could favour this dream? Among all the minimal paths from team A to team B, the metric scores each path by the average number of transfers along its edges, and the highest scores indicate the most promising routes. The intermediate teams, in order, are: Cagliari, Genoa, Atalanta.
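The ranking step described above can be sketched on its own. This is a minimal illustration, not the code used for the results: the function name `rank_paths`, its parameters and the toy transfer counts are assumptions made for the example. Unlike the loop in `short`, it returns fewer than 3 paths when fewer exist and does not repeat a path when two scores are tied.

```python
def rank_paths(paths, edge_weight, k=3):
    """Return up to k (path, mean_weight) pairs, best first.

    paths       -- iterable of node lists, e.g. all shortest paths between two clubs
    edge_weight -- function (u, v) -> weight of the edge between u and v
    """
    scored = []
    for path in paths:
        hops = list(zip(path, path[1:]))
        if not hops:
            continue  # a single-node path has no edges to average
        mean = sum(edge_weight(u, v) for u, v in hops) / len(hops)
        scored.append((tuple(path), mean))
    # list.sort is stable, so tied paths keep their input order
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:k]


# Toy, made-up transfer counts (not taken from the dataset)
toy_weights = {
    frozenset(("A", "X")): 4, frozenset(("X", "B")): 10,
    frozenset(("A", "Y")): 6, frozenset(("Y", "B")): 2,
}
toy_paths = [["A", "X", "B"], ["A", "Y", "B"]]
best = rank_paths(toy_paths, lambda u, v: toy_weights[frozenset((u, v))])
print(best)
```

With the graph built above, the same function could presumably be called as `rank_paths(nx.all_shortest_paths(g, start, end), lambda u, v: g.edges[u, v]['weight'])`.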
url = "Datasets/leghe_it.csv"
frame = pd.read_csv(url, error_bad_lines=False, sep=',')
frame = frame[frame.Movement == "Out"]
df = frame
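# Expand the ' mila €' / ' mln €' suffixes into zeros, then strip every non-digit character
frame['Costo'] = frame['Costo'].str.replace(' mila €','000')\
    .str.replace(' mln €','0000').str.replace(r'\D', '', regex=True).replace(r'^\s*$', np.nan, regex=True)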
frame.dropna(subset = ["Costo"], inplace=True)
new = pd.DataFrame(zip(frame.Club,frame.ClubInvolved,frame.Costo))
new.rename(columns={0:'Club', 1: 'ClubInvolved', 2: 'Costo'}, inplace=True)
new['Costo'] = new['Costo'].astype(int)
df_edges = new.groupby(by=['Club', 'ClubInvolved'], as_index=False)['Costo'].sum()
#df = df_edges[~df_edges[['Club', 'ClubInvolved']].apply(frozenset, axis=1).duplicated()]
df_edges = df_edges[df_edges.Costo > 0]
lista_unica = list(df_edges.Club.unique())
df_edges = df_edges[df_edges['ClubInvolved'].isin(lista_unica)]
g = nx.from_pandas_edgelist(df_edges, source="Club", target="ClubInvolved", edge_attr=["Costo"],create_using=nx.Graph)
start = 'Olbia Calcio 1905'
end = 'Juventus FC'
Shortest = (list(nx.all_shortest_paths(g, start, end)))
def short(prova):
    # Score each shortest path by its average edge weight (money exchanged per hop)
    pesi = []
    for lista in prova:  # iterate over the argument rather than the global Shortest
        a = 0
        for j in range(0, len(lista)-1):
            a += g.edges[lista[j],lista[j+1]]['Costo']
        pesi.append(a/(len(lista)-1))
    # Message (Italian): "The best paths in monetary terms for the agent,
    # starting from team {} to reach team {}, are:"
    print('I percorsi migliori in termini monetari per il procuratore '
          'per partire dalla squadra {} \n per arrivare '
          'alla squadra {} sono: \n'.format(start, end))
    perc = []
    values = sorted(pesi, reverse=True)[:3]
    # Print and collect the 3 highest-scoring paths (assumes at least 3 paths exist)
    for i in range(0, 3):
        a = sorted(pesi, reverse=True)[i]
        max_index = pesi.index(a)
        print(prova[max_index])
        perc.append(tuple(prova[max_index]))
    # Bar chart with the scores of the top 3 paths
    diz = {'Percorsi': perc, 'Valori': values}
    data = pd.DataFrame(diz)
    fig, ax = plt.subplots(nrows=1, ncols=1, figsize=(15, 10))
    sns.set(style="dark")
    ax.set_title('Money Values', size=13, color='Black')
    ax.tick_params(labelsize=11)
    sns.barplot(x='Percorsi', y="Valori", data=data, palette='YlOrRd', ax=ax)
    plt.setp(ax, ylabel=(''), xlabel=(''))
    plt.show()
    # Returns the last path printed, i.e. the third best
    return prova[max_index]
short(Shortest)
I percorsi migliori in termini monetari per il procuratore per partire dalla squadra Olbia Calcio 1905
per arrivare alla squadra Juventus FC sono:

['Olbia Calcio 1905', 'Genoa CFC', 'Juventus FC']
['Olbia Calcio 1905', 'Cagliari Calcio', 'Juventus FC']
['Olbia Calcio 1905', 'Inter', 'Juventus FC']
['Olbia Calcio 1905', 'Inter', 'Juventus FC']
We have a young player from Olbia who wants to join Juventus. Which path would an agent prefer if the goal is the highest monetary exchange value? Transfers with a higher economic value bring agents more income, since their earnings are a percentage of the fee. That is why they would also be interested in using this metric.
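One detail worth checking in the monetary graph: the fees are grouped by the ordered pair (`Club`, `ClubInvolved`), while `nx.Graph` keeps a single undirected edge per pair of clubs. As far as we can tell, when both directions are present in the edge list, the `Costo` value added last replaces the earlier one. A minimal sketch of one possible alternative, assuming the quantity of interest is the total money exchanged between two clubs regardless of direction (the helper name `undirected_totals` and the figures are made up for illustration):

```python
from collections import defaultdict


def undirected_totals(rows):
    """Sum fees per unordered pair of clubs.

    rows -- iterable of (seller, buyer, fee) tuples
    """
    totals = defaultdict(int)
    for seller, buyer, fee in rows:
        # frozenset ignores direction: (A, B) and (B, A) share one key
        totals[frozenset((seller, buyer))] += fee
    return dict(totals)


# Toy, made-up fees (not taken from the dataset)
rows = [
    ("Club A", "Club B", 1_000_000),
    ("Club B", "Club A", 250_000),
    ("Club A", "Club C", 500_000),
]
totals = undirected_totals(rows)
for pair, fee in totals.items():
    print(sorted(pair), fee)
```

Each unordered pair could then be added to the graph as one edge carrying the summed fee, so the path score no longer depends on row order.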
The idea was to advise a player's agent by showing how the chances of a player of nationality X reaching a top European club vary (for the sake of simplicity, only certain clubs and nationalities are used in the analysis).
History teaches us that certain clubs are more likely to select players of certain nationalities. If properly used, this factor could facilitate buying and selling.
url = "Datasets/leghe_euro_1020_INOUT.csv"
frame = pd.read_csv(url, error_bad_lines=False, sep=',')
frame = frame[frame.Movement == "Out"]
frame = frame.replace('FC Internazionale','Inter').replace('Parma Calcio 1913','Parma FC')
lista_club = ['Inter', 'Juventus FC', 'Cagliari Calcio','AC Milan']
lista_naz = ['Brasile', 'Argentina', 'Inghilterra', 'Francia', 'Spagna','Uruguay']
frame = frame[frame['Club'].isin(lista_club)]
frame = frame[frame['Nationality'].isin(lista_naz)]
frame['new_col'] = frame["Club"] + "," + frame["Nationality"]
def get_edges(data, column):
    series = data[column].dropna().apply(lambda x: x.split(","))
    cross = series.apply(lambda x: list(itertools.combinations(x, 2)))
    lists = [item for sublist in cross for item in sublist]
    source = [i[0] for i in lists]
    target = [i[1] for i in lists]
    edges = pd.DataFrame({"source": source, "target": target})
    edges["weight"] = 1
    # return edges
    return edges.groupby(by=["source", "target"], as_index=False)["weight"].sum()
df_edges_old = get_edges(data=frame, column="new_col")
df = df_edges_old[df_edges_old['weight'] > 0]
# df = df_edges[~df_edges[['source', 'target']].apply(frozenset, axis=1).duplicated()]
B = nx.from_pandas_edgelist(df, source="source", target="target", edge_attr=["weight"], create_using=nx.Graph)
B.add_nodes_from(frame['Club'], bipartite=0)
B.add_nodes_from(frame['Nationality'], bipartite=1)
d = dict(Counter(frame.Club))
a = dict(Counter(frame.Nationality))
clubs = list(df.source.unique())
clubsinv = list(df.target.unique())
plt.figure(figsize=(40, 30))
# # 3. Draw the parts we want
top = nx.bipartite.sets(B)[0]
pos = nx.bipartite_layout(B, top)
club_size = [B.degree(club)*30 for club in clubs]
k = list(d.keys())
v = list(d.values())
lista = list(a.keys())
pesi = list(a.values())
# Draw EVERYONE
nx.draw_networkx_nodes(B, pos, nodelist=k, node_color='yellow', node_size=[team*10 for team in v])
nx.draw_networkx_nodes(B, pos, nodelist=lista, node_color='red', node_size=[peso*10 for peso in pesi])
# nx.draw_networkx_nodes(B, pos, nodelist=clubsinv, node_color='red', node_size=200)
# nx.draw(B, nodelist=d.keys(), node_size=[v * 100 for v in d.values()])
# Draw POPULAR clubs
popular_clubsinv = [clubinv for clubinv in clubsinv if B.degree(clubinv) > 20]
edges,weights = zip(*nx.get_edge_attributes(B,'weight').items())
all_weights = []
# 4 a. Iterate through the graph edges to gather all the weights
for (node1, node2, data) in B.edges(data=True):
    all_weights.append(data['weight'])  # we'll use this when determining edge thickness
# 4 b. Get unique weights
unique_weights = list(set(all_weights))
# 4 c. Plot the edges - one by one!
for weight in unique_weights:
    # 4 d. Form a filtered list with just the weight you want to draw
    weighted_edges = [(node1, node2) for (node1, node2, edge_attr) in B.edges(data=True)
                      if edge_attr['weight'] == weight]
    width = weight * len(clubs)*13 / sum(all_weights)
    if weight > 20:
        # Edges with more than 20 transfers are drawn thicker
        nx.draw_networkx_edges(B, pos, edgelist=weighted_edges,
                               width=weight * len(clubs)*20 / sum(all_weights), edge_color="lightblue")
    else:
        nx.draw_networkx_edges(B, pos, edgelist=weighted_edges,
                               width=width, edge_color="lightblue")
node_labels = dict(zip(clubs,popular_clubsinv))
nx.draw_networkx_labels(B, pos, font_size=30)
plt.axis('off')
plt.title("Bipartite Network Team Nationality 2010 2020")
plt.show()
In our example, we looked at 4 teams, all in the Italian Serie A: Inter, Milan, Juventus, and Cagliari Calcio.
We also selected 6 of the most frequent countries of origin: France, Argentina, Brazil, Uruguay, England, and Spain.
We can see that Brazil is a source of players for Italy, and the teams that have relied most on Brazilian players are Inter and Milan. Many Argentine players are also bought by the clubs located in Milan. We also note that England (the country where football was born) has no particular link with the top Italian clubs or with Cagliari.
We also note that Inter has not bought any Spanish players in recent years. From this analysis, we can see that South American players are more sought after in Italy.
As mentioned above, we took only 4 teams for the sake of simplicity. It would be interesting to extend the analysis to all teams and all other leagues, to observe how the picture varies with the nationality of the player.
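When the comparison is extended to more clubs, raw counts may favour the clubs with the most transfers overall. A minimal sketch of one way to make clubs comparable, assuming each record is a (club, nationality) pair: it converts the counts into per-club shares. The helper name `nationality_shares` and the toy records are illustrative only and do not come from the dataset.

```python
from collections import Counter, defaultdict


def nationality_shares(records):
    """Return {club: {nationality: share}} from (club, nationality) pairs."""
    counts = defaultdict(Counter)
    for club, nationality in records:
        counts[club][nationality] += 1
    shares = {}
    for club, counter in counts.items():
        total = sum(counter.values())
        # Divide each count by the club's total so the shares sum to 1
        shares[club] = {nat: n / total for nat, n in counter.items()}
    return shares


# Toy, made-up records (not taken from the dataset)
toy_records = [
    ("Club A", "Brasile"), ("Club A", "Brasile"),
    ("Club A", "Argentina"), ("Club A", "Spagna"),
    ("Club B", "Francia"), ("Club B", "Francia"),
]
shares = nationality_shares(toy_records)
print(shares["Club A"])
```

In the bipartite graph, these shares could replace the raw `weight` on each club-nationality edge, so that edge thickness reflects a club's preference rather than its overall activity.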
Thanks to the analysis carried out, we have seen that in the last 10 years the Premier League has come to play a central role, both in the number of transfers and in the monetary amounts exchanged. It is also the league that focuses most on young players in every role on the field.
Serie A remains a very active league as far as player exchanges are concerned, with a greater prevalence of the over-30 age group than in the other leagues. This means that the monetary value of transfers is lower, because after this age a player tends to lose value.
Over the last 20 years, the Liga NOS has grown a lot in terms of number of transfers, with particular attention to players in the 21-28 age group. An agent would find it easier to bring a player to Portugal if the player is in mid-career or needs a second chance.
La Liga is less active as far as the number of transfers is concerned, but it spends more on players than the other leagues. It spends on established players in every role, which is also explained by the presence of two clubs to which every player aspires, Real Madrid and Barcelona.
Bundesliga and Ligue 1 are leagues where no particular significance has been found, as they are much less active than the other leagues.
As far as the Italian leagues are concerned, it was easy to observe that the teams with more transfers or a higher betweenness were not those considered top clubs.
The metrics we have created can also help an agent understand which path is best for the player to follow and which path offers the best commissions to the agent himself, provided they are combined with, or supported by, knowledge of the sporting management of the various clubs.
There are many studies on network analysis of individual match data, but very few on market transfers. This study provides a foundation for future work on other aspects.
By integrating paid data from platforms such as WyScout or Infront Sports & Media, this work could be expanded and combined with data on players on the field.
This project examined only 6 leagues of one continent. For a future study, it would be interesting to include the leagues not taken into account, or to focus on the various national teams.
The soccer market is a broad topic that could be explored more. We close this analysis with the regret that it was not possible to analyze the data of the women's leagues: in fact, there is currently only one site, in German, that contains partial data on the transfer flows of the German women's league.
For the help in deepening the topics.
⚽🌐
May 2021 - Cagliari